Extracting time series matching a small-angle X-ray scattering profile from trajectories of molecular dynamics simulations

Solving structural ensembles of flexible biomolecules is a challenging research area. Here, we propose a method to obtain possible structural ensembles of a biomolecule based on small-angle X-ray scattering (SAXS) and molecular dynamics simulations. Our idea is to clip a time series that matches a SAXS profile from a simulation trajectory. To examine its practicability, we applied our idea to a multi-domain protein ER-60 and successfully extracted time series longer than 1 micro second from trajectories of coarse-grained molecular dynamics simulations. In the extracted time series, the domain conformation was distributed continuously and smoothly in a conformational space. Preferred domain conformations were also observed. Diversity among scattering curves calculated from each ER-60 structure was interpreted to reflect an open-close motion of the protein. Although our approach did not provide a unique solution for the structural ensemble of the biomolecule, each extracted time series can be an element of the real behavior of ER-60. Considering its low computational cost, our approach will play a key role to identify biomolecular dynamics by integrating SAXS, simulations, and other experiments.


Supplementary
WP, Lt,  2 , and KLSAS-CLIP of other time series obtained by the SAS-CLIP    The color in heatmaps shows appearance probability in the (a-b-b', b-b'-a')-space. 8

Fig. S6. Distributions of  2 value of individual CGMD snapshots
Distribution histogram is shown for the time series #A-1, #A-2, and #A3.    Table. S1). The color shows the appearance probability in Distributions are shown for the six structural series, from #B-1 to #B-6. The color shows the appearance Distributions are shown for the six structural series. The color shows the appearance probability in the

Detail of atomistic MD simulations
The initial structures were prepared based on the crystal structure of ER-60 C60A mutant (PDBID: 3F8U) 1 with two modifications. First, the mutated residue was changed to cysteine using water molecules, 86 Na + ions, and 80 Clions. The Amber ff14SB force field and TIP3P water model 3,4 were employed. The electrostatic interactions were calculated with the particle-mesh Ewald method 5 .
The production runs were preceded by energy minimization by the steepest descent algorithm, equilibration under NVT condition at 300 K for 100 ps, and equilibration under NPT condition at 300K, with 1 bar for 1 ns. During the equilibrations, the coordinates of all heavy atoms were restrained to their initial positions. For the systems b-b' and a', the production runs were performed under NPT condition at 300K and 1 bar for 500 ns. For the system a, the production run was performed under NPT condition at 300K and 1 bar for 1,000 ns because of the large conformational change in the CGHC motif of a domain just before t = 500 ns (Fig. S2). In the simulations, except for the energy minimization, all bonds with hydrogen atoms in protein were constrained by LINCS, and water molecules are constrained by

Detail of coarse-grained (CG) MD simulations
CGMD simulations were performed using Martini 3 open-beta version 10 . The simulation system contains a single ER-60 molecule, 44,531 water beads, 537 Na + ions, and 527 Clions.
To maintain the folded structure, an elastic network potential is also defined between backbone beads that are more than three amino acids apart in the sequence. The residue pair to define the elastic network potential was determined based on the atomistic MD simulations as follows. First, each snapshot of the atomistic MD simulations was mapped to the corresponding CG model. Second, the elastic network potential was defined for amino acid pairs satisfying the following conditions: average distance of their backbone-bead pair < 9.0 Å and standard deviation of the distance < 0.5 Å. These calculations were done for the last 90% of each trajectory to avoid possible bias of the initial structure. We noted that fluctuation of the ER-60 fragment at their terminus may differ from that of the full-length ER-60. The elastic network potential of amino acids related to Gln 131 , Ala 132 , Gly 133 , Pro 134 , Tyr 364 , Leu 365 , and Lys 366 were defined based on the crystal structure. In the latter case, the elastic network was defined when a distance between two backbone beads was < 9.0 Å. The force constant of the elastic network model was 500 kJmol -1 nm -2 as described previously 11 .
The initial structure of CGMD simulations was prepared by connecting representative structures of the a, b-b', and a' domains in corresponding atomistic simulations. Each representative structure was selected as a structure that gives the smallest elastic-network potential energy. Finally, the C-terminal region was generated by PyMOL.
The topology file of CGMD simulations was generated by Martinize2 12 . The scfix option was utilized to fix sidechains 13 . To ensure the flexibility of hinge between folded regions and the C-terminal loop, scfix was turned off for Pro 134 , Tyr 364 , Leu 365 , Lys 366 , and the C-terminal loop.
The production runs were preceded by two relaxation steps. First, energy minimization by the steepest descent algorithm was performed. Second, equilibration under NPT condition was performed at 300K and 1 bar for 1 ns with velocity-rescaling temperature coupling algorithm and Berendsen's pressure coupling algorithm 14

Detail of reconstruction of atomistic models.
First, we used the software backward 15 and reverse-mapped all simulation snapshots. However, there were snapshots where backward calculations did not converge. For them, atomistic structures were reconstructed using the other atomistic models already obtained with the backward. The reconstruction procedure was as follows. This was done after performing backward for all snapshots.

[Preparation 1]
We defined a structure pool. This pool included pairs of a CG model and corresponding atomistic model obtained by the backward. The members of the pool were restricted to snapshots belonged to the same simulation trajectory as we wanted to reverse-map. The reason for the restriction is to reduce computation time.

[Preparation 2]
Full-length ER-60 was considered as a set of 28 parts. The reverse-mapping was done for each part separately. [Reverse-mapping] For a structure we wanted to reverse-map (here we call "target"), the most similar structure was searched for in the structure pool and superposed. The model search and superposition were performed for each of 28 parts separately. The detailed procedure was as follows: Step1. A model pair (a CG model and corresponding atomistic model) was extracted from the structure pool.
Step2. The extracted model pair was superposed on the target using SC1 and BB particles in the CG model. The region to superpose depended on type of the part.

Globular part
The region to superpose was the same as the part. For example, the SC1 and BB particles in the region Ser 25 -Ala 132 were superposed when reverse-map the part Ser 25 -Ala 132 .

Leu 505
The SC1 and BB particles in Asp 504 -Leu 505 were superposed. The reason for using a larger region than that to be reverse-mapped is to reverse-map backbone of flexible loop more precisely.

The other parts
The SC1 and BB particles in the part and its neighboring amino acids were superposed. For example, the SC1 and BB particles in the region Glu 503 -Leu 505 were superposed when reverse-map the part Asp 504 .